# ViT Architecture

Vitmodel Skincheck

This is a Vision Transformer-based model for classifying facial skin types into 5 categories.

Image Classification

Transformers English

Coco Instance Eomt Large 1280

An EoMT model from a paper that reinterprets the Vision Transformer (ViT) as an image segmentation model, demonstrating ViT's potential in image segmentation tasks.

Image Segmentation

Ade20k Panoptic Eomt Giant 1280

An EoMT model from a paper that reinterprets the Vision Transformer (ViT) as an image segmentation model, revealing ViT's potential in image segmentation tasks.

Image Segmentation

Ade20k Panoptic Eomt Large 1280

An EoMT image segmentation model based on the Vision Transformer (ViT), from a paper that reveals ViT's potential in image segmentation tasks.

Image Segmentation

Coco Panoptic Eomt Large 1280

An EoMT model from a paper that treats the Vision Transformer (ViT) as an image segmentation model and explores its potential in image segmentation tasks.

Image Segmentation

Coco Panoptic Eomt Large 640

This model reveals the potential of Vision Transformer (ViT) in image segmentation tasks by adapting its architecture for segmentation purposes.

Image Segmentation

Coco Instance Eomt Large 640

An EoMT model from a paper that reinterprets the Vision Transformer (ViT) as an image segmentation model, demonstrating ViT's potential in image segmentation tasks.

Image Segmentation

Coco Panoptic Eomt Giant 1280

By rethinking the architecture of Vision Transformer (ViT), this model demonstrates its potential in image segmentation tasks.

Image Segmentation

A fine-tuned model based on Vision Transformer (ViT) architecture for classifying chest X-rays, trained on the CheXpert dataset.

Image Classification

Transformers English

C-RADIOv2 is a visual feature extraction model developed by NVIDIA, offering multiple size versions suitable for image understanding and dense visual tasks.

Fairface Age Image Detection

An image classification model based on the Vision Transformer architecture, pre-trained on the ImageNet-21k dataset, suitable for multi-category image classification tasks.

Image Classification

Plant Identification Vit

A plant identification model fine-tuned from Google's Vision Transformer (ViT), achieving 80.96% accuracy on the evaluation set.

Image Classification

Vit Base Patch32 Clip 224.laion2b E16

A Vision Transformer model trained on the LAION-2B dataset, supporting zero-shot image classification.

Image Classification
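
Zero-shot classification with a CLIP-style model is usually described as comparing an image embedding against text embeddings of candidate labels and picking the closest one. The following is a minimal sketch of that scoring step only, assuming the embeddings are already available. The toy vectors, the prompt strings, the `zero_shot_scores` helper, and the `scale` value are all hypothetical and are not taken from this model's card.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_scores(image_emb, label_embs, scale=100.0):
    """Softmax over scaled cosine similarities (scale is an assumed temperature)."""
    sims = {label: cosine(image_emb, emb) for label, emb in label_embs.items()}
    top = max(sims.values())  # subtract the max for numerical stability
    exps = {label: math.exp(scale * (s - top)) for label, s in sims.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Made-up 3-dimensional embeddings; a real model produces much larger vectors.
image_emb = [0.9, 0.1, 0.2]
label_embs = {
    "a photo of a cat": [1.0, 0.0, 0.1],
    "a photo of a dog": [0.0, 1.0, 0.1],
    "a photo of a car": [0.1, 0.1, 1.0],
}

probs = zero_shot_scores(image_emb, label_embs)
best = max(probs, key=probs.get)
print(best)
```

No label-specific training is involved in this step: changing the candidate labels only changes the text embeddings being compared.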

Dust3r ViTLarge BaseDecoder 512 Dpt

DUSt3R is a model for geometric 3D vision that reconstructs 3D scenes from one or more images.

Dust3r ViTLarge BaseDecoder 512 Linear

DUSt3R is a deep learning model that generates 3D geometric models from images for geometric 3D vision tasks.

Dust3r ViTLarge BaseDecoder 224 Linear

DUSt3R is a model for geometric 3D vision that reconstructs 3D scenes from one or more images.

Cvlface Adaface Vit Base Kprpe Webface12m

A face recognition model that uses keypoint relative position encoding (KPRPE) with a ViT architecture, trained on the WebFace12M dataset.

Transformers English

Finetuned Clothes

A clothing classification model fine-tuned from Google's ViT model, covering 7 clothing categories.

Image Classification

Skin Cancer Image Classification

A Vision Transformer (ViT)-based skin cancer image classification model capable of identifying 7 types of skin lesions.

Image Classification

Vogue Fashion Collection 15

A fashion collection classification model fine-tuned from Google's Vision Transformer (ViT), capable of recognizing clothing collections from 15 top fashion brands.

Image Classification

Deepfake Vs Real Image Detection

An image classification model based on Vision Transformer architecture, used to detect real images versus AI-generated fake images.

Image Classification

Organoids Prova Organoid

An image classification model fine-tuned from Google's ViT-base-patch16-224 on an image folder dataset, achieving 85.76% accuracy on the evaluation set.

Image Classification

Driver Drowsiness Detection

A driver drowsiness detection model based on the ViT architecture, fine-tuned on the UTA RLDD dataset, with a reported accuracy of 97.5%.

Image Classification

Clip Vit Large Patch14 Finetuned Fruits 360 Vitlarge

A high-precision fruit image classification model, fine-tuned from CLIP ViT-Large on the Fruits-360 dataset.

Image Classification

Helicopters Vit

A helicopter image classification model based on the Vision Transformer architecture, capable of identifying different types of helicopters

Image Classification

A ViT model fine-tuned on the preprocessed 1024 configuration dataset for image classification tasks

Image Classification

Hq Fer2013notestaugm

A fine-tuned image classification model based on the ViT architecture, reported to perform well on the FER2013 dataset.

Image Classification

Large Algae Vit Rgb

A Vision Transformer (ViT) model for classifying algae images.

Image Classification

A model fine-tuned from facebook/deit-small-patch16-224; its intended use case is not stated.

Image Classification

Google Vit Base Patch16 224 Cartoon Emotion Detection

A cartoon image emotion classification model fine-tuned from Google's Vision Transformer (ViT), achieving 88% accuracy on the test set.

Image Classification

An image classification model fine-tuned from Google's ViT model on the beans dataset, achieving 96.99% accuracy.

Image Classification

Vit Large Patch32 224.orig In21k

An image classification model based on Vision Transformer (ViT) architecture, pretrained on the ImageNet-21k dataset, suitable for feature extraction and fine-tuning scenarios.

Image Classification
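
Several names in this listing, such as "Patch32 224" or "Patch16 384", appear to follow the common convention of stating the patch size and then the input resolution. A ViT splits the input image into non-overlapping square patches, so these two numbers determine how many patch tokens the model processes. The sketch below assumes that naming convention holds; the `patch_grid` helper is hypothetical and not part of any library.

```python
import re

def patch_grid(model_name):
    """Return (patches_per_side, total_patches) parsed from a model name.

    Assumes the name contains 'patch<P>' immediately followed by the input
    resolution, e.g. 'Vit Large Patch32 224.orig In21k'.
    """
    match = re.search(r"patch(\d+)[\s_-]+(\d+)", model_name.lower())
    if match is None:
        raise ValueError(f"no 'patch<P> <resolution>' pattern in {model_name!r}")
    patch, resolution = int(match.group(1)), int(match.group(2))
    if resolution % patch != 0:
        raise ValueError("resolution is not a multiple of the patch size")
    side = resolution // patch
    return side, side * side

print(patch_grid("Vit Large Patch32 224.orig In21k"))  # (7, 49)
print(patch_grid("Vit Base Patch16 384 Wi3"))          # (24, 576)
```

The counts cover patch tokens only; any extra class token a given model adds is not included. Names where a different number follows the patch size, such as a dataset name, would need a stricter parser.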

Vit Base Patch16 224 In21k Finetuned Cifar10 Album Vitvmmrdb Make Model Album Pred

A Vision Transformer (ViT) based model fine-tuned on the CIFAR-10 dataset for image classification tasks

Image Classification

Ast Finetuned Audioset 12 12 0.447

An Audio Spectrogram Transformer (AST) fine-tuned on the AudioSet dataset, using ViT architecture to process audio spectrograms, achieving excellent performance on multiple audio classification benchmarks.

Audio Classification

Vit Base DogSick

A visual classification model fine-tuned from Google's ViT base model, suitable for domain-specific image recognition tasks.

Image Classification

Vit Base Patch16 384 Wi3

A model fine-tuned from Google's Vision Transformer (ViT), suitable for image classification tasks.

Image Classification

Yolos Small Rego Plates Detection

A small Vision Transformer model based on the YOLOS architecture, fine-tuned specifically for license plate detection.

Object Detection

Yolos Small Finetuned Masks

A small-scale Vision Transformer model based on YOLOS architecture, fine-tuned specifically for mask detection tasks, trained on COCO and mask detection datasets

Object Detection

This is an image classification model based on the Vision Transformer architecture, achieving an accuracy of 95.02%.

Image Classification

An image classification model fine-tuned from facebook/deit-tiny-patch16-224 on an image folder dataset.

Image Classification

© 2025 AIbase